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Abstract 

In the context of multilevel longitudinal data, where sample units are collected 
in clusters, an important aspect that should be accounted for is the unobserved 
heterogeneity between sample units and between clusters. For this aim we propose 
an approach based on nested hidden (latent) Markov chains, which are associated 
to every sample unit and to every cluster. The approach allows us to account for the 
mentioned forms of unobserved heterogeneity in a dynamic fashion; it also allows 
us to account for the correlation which may arise between the responses provided 
by the units belonging to the same cluster. Given the complexity in computing 
the manifest distribution of these response variables, we make inference on the 
proposed model through a composite likelihood function based on all the possible 
pairs of subjects within every cluster. The proposed approach is illustrated through 
an application to a dataset concerning a sample of Italian workers in which a binary 
response variable for the worker receiving an illness benefit was repeatedly observed. 

Keywords: composite likelihood, EM algorithm, latent Markov model, pairwise 
likelihood 
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1 Introduction 



In modeling longitudinal data, it is common to account for the unobserved heterogeneity 
between sample units, that is, the heterogeneity that cannot be explained on the basis of 
the observable covariates (Diggle et al., 2002; Hsiao, 2003; Frees, 2004; Fitzmaurice et al., 
2009). This is normally accomplished by the introduction of latent variables or random 
effects. For instance, a typical approach consists of associating a random intercept to every 
sample unit which affects the distribution of each occasion-specific response in the same 
fashion. This allows us to account for a form of time- constant unobserved heterogeneity 
which is due to unobservable covariates and related factors. 

More recent approaches for longitudinal data are based on allowing for a form of time- 
varying unobserved heterogeneity, relaxing in this way the assumption that the effect of 
unobservable covariates on the response variables is constant in time. This is sensible in 
many applied contexts, especially in the presence of long panels and with a limited set 
of observable covariates. Among these time-varying approaches, it is worth mentioning 
the one described in Heiss (2008), which is based on random effects having an AR(1) 
structure, and that proposed by Bartolucci and Farcomeni (2009), which is based on a 
hidden (latent) Markov chains for capturing the unobserved heterogeneity in a dynamic 
fashion. For a comparison between the two approaches see Bartolucci et al. (2010a). 

The above considerations are obviously pertinent when we deal with multilevel longi- 
tudinal data, where sample units are collected in clusters, with the addiction that it is also 
appropriate modeling the unobserved heterogeneity between clusters and the correlation 
between the responses provided by the units in the same cluster. Note that multilevel 
longitudinal data are more and more easily encountered in socio-economic contexts. In 
particular, the dataset motivating this paper, that will be described in detail in the fol- 
lowing, concerns a sample of workers (sample units) in different firms (clusters), who are 
longitudinally observed. As response we have a binary variable equal to 1 if the employee 
receives illness benefits in a certain year and to otherwise. Datasets having a similar 
structure are nowadays available, for instance, in educational contexts, where students 
are collected in classes and are followed for a certain number of years of schooling. In 



2 



these datasets we typically have a limited set of observable covariates and the need arises 
for an appropriate modeling of the unobserved heterogeneity between both sample units 
and clusters. 

For the aim described above, we propose an approach based on nested hidden Markov 
chains which may be seen as an extension of the approach proposed by Bartolucci and 
Farcomeni (2009) for longitudinal data. In particular, we associate a first-order homoge- 
neous hidden Markov chain to every sample unit and to every cluster. The time-specific 
realizations of these two chains go to affect the distribution of the response variables to- 
gether with the covariates observed at unit and cluster levels. Coming back to the above 
example about the sample of employees, the different states of the unit-level Markov chain 
correspond to different levels of the residual (not explained by the unit-level observable 
covariates) tendency to require an illness benefit by an employee. A similar interpretation 
may be found for the different states of the cluster-level Markov chain, which affect the 
behavior of the employees in the same firm. Moreover, the possibility that the unit-level 
state changes may be due to events of the employee's life that are not recorded in the 
dataset, such as a sudden worsening of his/her health status. Similarly, a change in the 
cluster-level state may be due to events about the firm, such as the change of the man- 
agement. In any case, we can test if the latent effects are indeed dynamic or not on the 
basis of the dataset at hand. 

The proposed approach may be cast in the literature about latent Markov (LM) models 
for longitudinal data, as described by Bartolucci et al. (2010b). It is worth noting that 
other multilevel extensions of the latent (or hidden) Markov approach for longitudinal 
data are available in the literature. We mention, in particular, the extensions proposed 
by Bartolucci et al. (2009) and Bartolucci et al. (2011). About multilevel extensions 
see also Asparouhov and Muthen (2008) and about related models including random 
effects, but not in a context of analysis of multilevel data, see van de Pol and Langeheine 
(1990), Altman (2007), and Maruotti (2011). In these cases the effects (fixed or random) 
associated to every cluster are time-constant. However, an extension in which these effects 
are time-varying has not been proposed yet, at least to our knowledge. 

Under the proposed model, the manifest distribution of the response variables is com- 
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putationally intractable in most applications. Therefore, to make inference on the model 
we exploit an approach based on a composite likelihood (Lindsay, 1988; Cox and Reid, 
2004), which is computed on the basis of the joint distribution of the response variables 
for each pair of subjects in the same cluster. A similar approach was followed by Renard 
et al. (2004) to deal with a multilevel probit model; for applications of this inferential 
approach to similar contexts, see Hjort and Varin (2008) and Varin and Czado (2010). In 
particular, we show how to compute the pairwise likelihood by using the same recursion 
exploited by Baum et al. (1970) to deal with hidden Markov models and how to maximize 
this likelihood by an Expectation-Maximization (EM) algorithm similar to the one they 
suggest and implemented along the same lines as in Bartolucci and Farcomeni (2009). We 
also show how to obtain standard errors for the parameter estimates and how to make 
model selection on the basis of the composite likelihood information criterion (CLIC) de- 
veloped by Varin and Vidoni (2005). An R implementation of the functions used for the 
estimation of the model in the presence of binary response variables is available to the 
reader upon request. 

The paper is organized as follows. In the next section we briefly review the LM model 
with covariates (Bartolucci and Farcomeni, 2009) and its maximum likelihood estimation. 
Section 3 illustrates the proposed multilevel extension dealing with the case of continuous 
and binary response variables. Pairwise likelihood inference for this model is described 
in Section 4. In Section 5 we illustrate the model by an application based on the dataset 
concerning the sample of workers mentioned above. Finally, in Section 6 we draw the 
main conclusions. 



2 Using hidden Markov chains for modeling unob- 
served heterogeneity 

Consider a panel of n subjects observed at T occasions and let denote the response 
variable of interest for subject i at occasion t, i — 1, . . . , n, t — 1, . . . , T, and let zf^ be the 
corresponding column vector of covariates, which may also include the lagged responses. 
In the context of our application, the response variables are binary, although the LM 
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model may be also applied to variables having a different nature. 

In the following, we outline how to model these data accounting for unobserved het- 
erogeneity in a dynamic fashion, by introducing a hidden Markov chain, as suggested by 
Bartolucci and Farcomeni (2009). 

2.1 Model assumptions 

(1) (T) 

We assume that, for i = l,...,n, the response variables Y i , . . . , Y { are condition- 

(1) (T) 

ally independent given the covariate vectors Z\ , . . . , Z\ and a latent process V \ = 
[V i , . . . , Vi ), which follows a first-order homogeneous Markov chain and is indepen- 
dent of the covariates. 

This chain has k states, labeled from 1 to k, with initial and transition probabilities 

k v = p(vP=v), v = l,...,k, 
n v \v = p(Vi = v\V^~ l) = v), t = 2, . . . ,T, v,v = 1, . . .,k. 

Note that, in the above definitions, v refers to the current state, whereas v refers to the 
previous one. This convention will be used throughout the paper. Moreover, the initial 
probabilities are collected in the fc-dimensional column vectors 7r, whereas the transition 
probabilities are collected in the k x k transition matrix II. Note that these probabilities 
are the same for all sample units and, in particular, the transition probabilities are time 
homogenous. Moreover, in order to make the model more parsimonious, different con- 
straints may be imposed on the matrix II; see also Bartolucci (2006). For instance, we 
may assume that this matrix is tridiagonal, with constant off-diagonal elements, so that 
with fc = 3we have 

n- P P o \ 
n= P i-2 P P , (i) 
V o P i-p) 

where p is a parameter between and 0.5 to be estimated. 

For subject % at occasion t, the latent variable corresponds to the level of the 
unobservable characteristic of interest. The way in which this characteristic affects the 
corresponding response variable depends on the assumed measurement model. For in- 
stance, in the case of continuous response variables, it is natural to formulate the following 



assumption on the conditional distribution of Y { given VP and zf: 

YPIVP = v, zf = z~ N(/3 V + z'S, a 2 ), 

where f3 v is an intercept related to the latent state and 8 is a vector of regression coeffi- 
cients. Obviously, these parameters, including the variance a 2 , can be estimated together 
with the above initial and transition probabilities. 

With binary response variables, instead, it is natural to assume that 

= Vi zf = z~ Bern(ipf(v, z)), 

where 

log i 7Wt z^^ + zS, 

with ipf ( v i z ) corresponding to the conditional "probability of success" , that is ip® (v, z) = 

The above approach may be extended to response variables having a different nature, 
even ordinal variables, and also to multivariate contexts, in which we observe more re- 
sponse variables at each time occasions. We refer the reader to Bartolucci and Farcomeni 
(2009) for details on the resulting LM model. 

2.2 Maximum likelihood estimation 

When we deal with an observed sample, for % — 1, . . . , n we have an observed response con- 
figuration y i = (y| , . . . , Vi) and an observed sequence of covariates vectors z[ , . . . , zf ; 
we collect these covariates in the unique vector Zi (for all time occasion). In order to per- 
form maximum likelihood estimation of the above model on the basis of these data, the 
need arises of computing the manifest distribution of y i given z^ that is, 

P{Vi\*i) = Y.P^i\ V i = v > z i)v(Yi = v), (2) 

V 

(1) (T) 

where the sum J2v is over an the possible configurations v = (v^ , . . . , v\ ) of the latent 
process Vi. 

Efficient computation of the probability in (2) may be performed by exploiting a for- 
ward recursion available in the hidden Markov literature (see Baum et al., 1970; Levinson 
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qf! = (3) 



et al., 1983; MacDonald and Zucchini, 1997). As in Bartolucci (2006) and Bartolucci and 
Farcomeni (2009), it is convenient to express this recursion by using the matrix notation 
on the basis of the initial probability vectors 7r and transition matrix II. For this aim, 
consider the column vector qf^ with elements 

p{yf\ yf\ = v, z?\ . . . , zf } ), v = l,...,k. 

This vector may be recursively computed as follows: 

diag(m J - 1 ' ) )7r, if t = 1, 

diag(mf' ) )II / qf X \ otherwise, 

where mf ^ is the column vector with elements p{yf'\V^ = v, zf^), for v = 1, . . . , k, which 
is defined on the basis of the assumed measurement model. Once this recursion has been 
performed for t — 1, . . . , T, we may obtain p(yj) as the sum of the elements of the vector 

(T) 

Qi ■ 

Maximum likelihood estimation is performed by maximizing the log-likelihood £(0) = 
Y^i^og[p(y,i\zi)], where denotes the vector of all model parameters. We maximize this 
function by an EM algorithm (Baum et al., 1970; Dempster et al., 1977), which is based 
on the complete data log-likelihood denoted by £*(9), that is, the log-likelihood that we 
could compute if we knew the latent state of each subject at every occasion. 

The EM algorithm alternates two steps (E and M) until convergence: the i?-step 
computes the conditional expectation of £*(0), given the observed data and the current 
value of 0, using recursions similar to the one illustrated above; the M-step maximizes 
this expected value with respect to 6, so that this parameter vector results updated. 
The latter may require simple iterative algorithms of Newton-Raphson type. A detailed 
description of this EM algorithm is available in Bartolucci and Farcomeni (2009). 



3 Proposed multilevel extension 

In the context of multilevel longitudinal data, the n sample units are grouped, according 
to some criteria, in H clusters of size ni, . . . ,nu- Then, for each subject i in cluster 
h, data are available at T consecutive occasions. In particular, we denote by Y^f the 
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corresponding response variable and by Z hi the corresponding column vector of covariates, 
where h = 1, . . . , H, % = 1, . . . , nh, and t = 1, . . . , T. Moreover, by X®, with h = 1, . . . , H 
and t = 1, . . . , T, we denote column vectors of cluster-level covariates, which may be time- 
varying. 

In the following we show how multilevel longitudinal data, having the structure de- 
scribed above, may be analyzed by an extension of the approach outlined in Section 2. 

3.1 Model assumptions 

(D (T) 

Our extension assumes the existence of a latent process U h = (U£ J ,...,U£') for each 
cluster h, h = 1, . . . , H, and a latent process V hi = (V hi , . . . , V hi ) for each subject i, 
i = l,...,n/i, in the cluster. Both processes follow a first-order homogeneous Markov 
chain with ki states at cluster level and hi at individual level. These processes are 
assumed to be independent each other and also independent of the unit- and cluster-level 
covariates. Moreover, extending the assumptions formulated in Section 2, we impose that, 
for every sample unit hi (unit i in cluster h), the response variables are conditionally 
independent given Uh, Vhi and the corresponding covariates. This implies that the 
response vectors for two subjects in the same cluster are conditionally independent given 
Uh, but they are not marginally independent. This marginal independence holds for 
subjects belonging to two different clusters. 

The initial and the transition probabilities of each cluster-level latent process are 
denoted by 

K = p(U^ ] =u), u = l,...,k 1 
K\u = p(U^ = u\U^~ x) = u), t = 2,...,T, u,u = 

and are collected in the vector A and in the transition matrix A. Moreover, for the unit- 
level latent processes Uhi, we substantially adopt the same notation as in Section 2, and 
then we let ir v = p(V h ^ = v) and 7r„| C = p(V® = v\V® = v); these initial and transition 
probabilities are still collected in the vector tt and in the matrix n, respectively. 

Finally, about the conditional response probabilities, the same considerations ex- 
pressed in Section 2 still holds. Then, in the case of continuous response variables we 
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may assume that: 

Y^IUP = u, v£f = v, X h t] = x, Z<£} = z~ N(a u + f3 v + x'7 + z'S, a 2 ), 

where a u is an intercept related to the cluster-level latent state, /3 V is an intercepts re- 
lated to the unit-level latent state, and 7 and <5 are corresponding vectors of regression 
coefficients. 

With binary response variables, instead, it is natural to assume that 

YfflvP = u, V$ = v, Xf = x, zjg = z~ Berntyjg (u, v, x,z)), (4) 

where 

log ^^1 V > X > Z) =a u + (3 v + z'7 + z'S, 
1 - ( u i v i x > z ) 
with parameters having the same interpretation as above. 

3.2 Manifest distribution 

When we observe a set of multilevel longitudinal data, we have a sequence of response 
Vhi = \Vhii • • • iVhi) f° r every sample unit hi, with h = 1, . . . , H, i = 1, . . . , n h . We 
denote by y h the vector obtained by collecting the responses of all subjects in cluster 
h, that is yfl for i = 1, . . . ,rih and t = 1, . . . ,T. Similarly, we observe the vectors of 

(t) (T) 

unit-level covariates z h (, . . . , z hi ; these covariates are collected in the unique vector zu 
when referred to the unit hi (for all time occasions) and in the vector z^ when referred 
to all units in the same cluster h. Finally, for every cluster h, we observe the vectors 
of cluster- level covariates which are collected in the unique vector x h (for all time 
occasions). 

Under the above assumptions, the manifest probability of y h given Xh and Zh has the 
following expression: 



p(y h \x h ,z h ) = J2p( U h = u ) 



u 

r 

X I YjP(Vhi\ U h = U, V hi = V, X h , Z h )p(V hi = V) 



i=l 



V 



where the sum J2u i s over & U the possible configurations of the latent process V \ and J2v 
is over all the possible configurations of V hi- 



For the cases in which computing p(y h \x h , z h ) is feasible, estimation of the model pa- 
rameters can be performed by maximizing the log-likelihood £(0) = ^ h \og\p{y h \xf l1 Zh)\. 
However, computation of p(y h \xh, zu) is usually infeasible even if the conditional prob- 
ability p(y hi \Uh = u,Vhi = v,Xf L ,Zh) is obtained by recursion (3). For this reason, we 
suggest below a pairwise likelihood based approach. 

4 Pairwise likelihood inference 

In order to make inference on the model parameters, we exploit the following pairwise 
log-likelihood: 

H n h -l n h 

pW = E E E Me), 

h=l i=l j=i+l 

pthij(O) = log\p(y hi ,y hj \x h ,z hi ,z h j)], 

which recalls the pairwise log-likelihood used by Renard et al. (2004). 

Note that, when the dimension of each cluster is two (n^ = 2, h = 1, . . . , H), this func- 
tion is the exact log-likelihood of the model, since it is based on the manifest probability 
of the responses provided by all the possible pairs of subjects in the same cluster. 

4.1 Computation and maximization of the pairwise likelihood 

In order to efficiently compute the probability p(y hi , y h j\x h , z hi , z h j) as a function of the 
parameters in 6, we exploit recursion (3) already used for the model illustrated in Section 
2. In fact, we have that 

PiVhi, Vhj) = viVhip • • • > VMj\ x h, z hi, Zhj), 

where y^ is a realization of the vector = (Y^f , Y^f)'. It may be simply proved that, 
for t = 1, . . . ,T, these vectors follow a bivariate LM model with covariates since they 
are conditionally independent given the latent process . . . , W>J- , where W^?- = 

(U® , V® , V®), and the corresponding covariates. In particular, this latent process fol- 
lows a Markov chain with an augmented space of k = k x k\ states indexed by w = 
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(w, t>i, t>2). It is simple to see that the initial probability of state w is 

<Pw = P(Wfy = W) = X U 7T V1 7T V2 , (5) 

whereas, for t — 2, . . . , T, transition probability from state w = (u, vi, v 2 ) to w is 

<I>W\W = P(W%1 = WIW^ = W) = Xu\u^vx\v^v 2 \v 2 - (6) 

Moreover, in the case of discrete or categorical response variables, the model assumptions 
imply that, given W hi = w, the conditional probability of j/j^- is equal to 

p(Vhij\W ( h} = w,x h ,z hi ,z hj ) = p{y h t }\u,v 1 ,x h t \z h ^)p{yf j \u,v 2 ,x h t \z h t )). (7) 

A similar expression holds for continuous response variables, based on the corresponding 
density functions. 

In order to compute p{y hi ,y h< j\x h , z hil z h j), recursion (3) is applied with rrij substi- 
tuted by the vector rh hi j having elements piVhijl^hi = w, x h , z hi , z h j) for all w. Simi- 
larly, 7r must be substituted by the initial probability vector <p with elements <p w and II 
by the transition matrix $ with elements 4> w \w- 

The pairwise log-likelihood p£(0) can be maximized by an EM algorithm having a 
structure that closely recalls that outlined in Section 2.2. In this case, in particular, the 
complete data pairwise log-likelihood is 

H n h -l n h 

pt{o) = E E E ptliM 

h=l 1=1 j=i+l 

where 

w 

+ E E E d hU™i w ) iog(0«,|«,) 

t>l w w 

+ E d hU w ) log[p(yS,-| W$ = w, x h , z hi , z hj )}. (8) 

t W 

In the above expression, d h ^{w) is a dummy variable equal to 1 if, at occasion t, cluster 
h is in latent state u, subject hi is in latent state V\, and subject hj is in latent state v 2 \ 
moreover, we have dj^- (10,10) = d h t ^ 1 \w)d h t ^(w). 
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The complete data pairwise log-likelihood may be simply expressed in terms of the 
parameters of the proposed multilevel model by substituting (5), (6), and (7) in the above 
expression. For instance, the first component becomes the sum over u of 

d { Mj(u) log[A h (u)] + tfujfa v i) log[7r w (ui|«)] + v 2 ) \og[ir hj (v 2 \u)], (9) 

where the variables d^Au), d^j(u,vi), and d^{u, v 2 ) are obtained by summing d^(w) 
over suitable configurations of w. In a similar way we can express the other two compo- 
nents involving the transition and the conditional response probabilities (or densities). 

At the E-step of the EM algorithm, the conditional expected value of each dummy 
variable d^Aw) and d^(w,w) is computed by using the same recursions exploited in 
the algorithm of Baum et al. (1970). At the M-step, the model parameters are updated 
by maximizing the function resulting by substituting the expected values in (8) and ex- 
ploiting the simplification (9) and similar simplifications. In any case, the final algorithm 
is implemented along the same lines as the algorithm implemented by Bartolucci and 
Farcomeni (2009). We make our R implementation available to the reader upon request. 

4.2 Model selection and hypothesis testing 

As in Renard et al. (2004), we estimate the variance-covariance matrix of the pairwise 
likelihood estimator 6, and then obtain standard errors, by the following sandwich formula 

£(0) = J l kj~\ 

where 

&pe h (d) ~ d P £ h (6) d P e h (b) ^ 

J = -?^^' K = ^—e p4(0) = E J>W*)- 

We obtain the first derivative of p£h{6) as a by-product of the EM algorithm. The second 
derivative, instead, is obtained by a numerical method. 

General results on the asymptotic properties of the pairwise likelihood estimator 6 
can be derived along the lines of classical maximum likelihood estimators. However, the 
former is expected to be less efficient since it relies on a restricted amount of information 
(Renard et al., 2004). 
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In order to deal with model selection, Varin and Vidoni (2005) suggested CLIC. Ac- 
cording to this criterion, the model to be selected is the one which maximizes the following 
index 

CLIC = pi{6) - ti(KJ l ). (10) 

We use this criterion to select the number of states k\ and /C2 of each latent process Uh 
at cluster level and V h% at unit level. Moreover, it can be also used for selecting one of 
the possible parametrizations illustrated in Section 3. 

5 Application 

We illustrate the proposed approach by an application based on a dataset on individual 
work histories derived from the administrative archives of the Italian National Institute of 
Social Security (INPS). We consider a sample of 1,876 employees (both blue-collars and 
white-collars) from 249 private Italian firms with 1,000 to 10,000 workers. The subjects, 
continuously working in the same firm and aged between 18 and 60 in 1994, were followed 
for 6 years, from 1994 to 1999. See Bartolucci and Nigro (2007) for further details. 

As already mentioned in Section 1, the binary response variable of interest is illness 
(equal to 1 if the employee received illness benefits in a certain year and to otherwise). 
We also consider a set of unit- and cluster-level covariates: gender (dummy equal to 1 
for woman), age in 1994, area (Noth-West, North-East, Center, South, or Islands), skill 
(dummy equal to 1 for a blue-collar), income (total annual compensation in thousands of 
Euros), and part-time (dummy equal to 1 for a part-time employee). Among the covariates 
we also include the lagged response. 

To this dataset, we fitted the model described in Section 3 under the constraint that the 
transition matrices for both processes are tridiagonal with constant off-diagonal elements; 
see equation (1). We also assume a logistic regression model as in (4) for the conditional 
probabilities. Then, the unit-level latent process is expected to capture the propensity 
(which is not explained by the observed covariates) to get ill of every subject, whereas 
the cluster-level latent process explains the effect of different firms on the propensity to 
require illness benefits. 
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The first step of the analysis is the choice of the number of states for the cluster- and 
unit-level latent processes, denoted by ki and k% respectively. This choice is based on 
CLIC, which is based on the index defined in (10). The value of this index is reported in 
Table 1 for different values of k\ and k 2 . According to these results we select the model 
with k\ = 3 states at cluster level and k 2 = 2 at unit level. 

Table 1: Values of CLIC for different values of k\ and k 2 (in boldface the largest CLIC 
value). 







k 2 






1 


2 


3 


l 


-30724 


-30300 


-29972 


2 


-30144 


-29773 


-29779 


3 


-30018 


-29705 


-29756 


4 


-30001 


-29727 


-29747 



Table 2 collects the estimates of the regression parameters obtained with the selected 
number of states. We note that the probability of receiving illness benefits is positively 
related to being a blue-collar and to the lagged response, whereas it is negatively related 
to income and to having a part-time job. The effects of gender, age and age squared are 
not significant. 

About the distribution of each cluster and unit-level latent process, the estimates of 
the initial and transition probabilities are reported in Table 3 and 4. For both processes, 
we observe that the states are well separated and the second state is the one with the 
highest initial probability. Moreover, the estimates of the transition matrices show that 
the cluster-level latent process has a lower persistence than the unit-level latent process. 

Finally, we tried to simplify the model selected above by restricting the transition 
matrix of each latent process to be diagonal, so that transition between latent states is 
not allowed. In particular, the model in which the transition matrix at cluster-level is 
diagonal has a slightly lower value of CLIC equal to -29,706. On the other hand, the 
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Table 2: Estimates of the logistic regression parameters (collected in the vectors 7 and 8) 
affecting the conditional probabilities. 



parameter 


estimate 




i.e. 


i-stat 


p- value 


intercept 


-3 


.474 


1 


.364 


-2. 


,547 


0.011 


gender 





,161 





,184 


0, 


,876 


0.382 


age 


-0 


,003 





,045 


-0, 


,067 


0.947 


age 2 / 100 





,038 





,060 


0, 


633 


0.527 


area: North-East 





,145 





,257 


0, 


,564 


0.573 


area: Center 


-0 


,096 





.284 


-0, 


338 


0.735 


area: South 


-0 


.427 





.355 


-1, 


203 


0.229 


area: Islands 


-1. 


046 


0, 


,485 


-2, 


,157 


0.031 


skill 


2 


.037 





.423 


4, 


,816 


0.000 


income 


-0 


,200 





,035 


-5, 


,714 


0.000 


part-time 


-0 


.795 





.338 


-2, 


352 


0.019 


lagged-response 





.600 





.172 


3. 


,480 


0.000 



Table 3: Support points and initial and transition probabilities of each cluster-level latent 
process. 



latent 

state (u) 



1 

2 
3 



support 
point (a u ) 



initial 
probability (A u ) 



transition 
probabilities (A u i„) 



0.000 
0.444 
2.931 



0.2221 
0.7181 
0.0598 



0.9130 0.0870 0.0000 
0.0870 0.8260 0.0870 
0.0000 0.0870 0.9130 
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Table 4: 



Support points and initial and transition probabilities of each unit-level latent 



process. 



latent 


support 


initial 


transition 


state (v) 


point (f3 v ) 


probability (ir v ) 


probabilities {n v \v) 


1 


0.000 


0.4122 


0.9729 0.0271 


2 


2.718 


0.5878 


0.0271 0.9729 



restriction that the transition matrix at unit-level is diagonal leads to a strong decrease 
of CLIC, which is equal to -29,757. We then retain the model in which latent transition 
is allowed both at cluster and unit levels. 

6 Conclusions 

With reference to multilevel longitudinal data, where sample units are collected in clus- 
ters, in this paper we propose an approach to account for the unobserved heterogeneity 
between sample units and between clusters in a dynamic fashion. The approach is based 
on associating a hidden (or latent) Markov chain to every sample unit and to every clus- 
ter. These Markov chains are assumed to be homogeneous and of the first-order, with 
transition probabilities that may be subjected to suitable constraints. The approach then 
extends the one proposed by Bartolucci and Farcomeni (2009), who proposed a latent 
Markov model with covariates for longitudinal data (not having a multilevel structure). 

The complexity of the model formulated on the basis of the proposed approach does 
not allow us to make exact likelihood inference on its parameters. Therefore, we adopt 
a composite likelihood framework for making inference, which is based on considering 
all the possible pairs of units in every cluster, as suggested by Renard et al. (2004) in 
a simpler context. Within this framework, we also deal with model selection, based on 
the composite likelihood information criterion (Varin and Vidoni, 2005), and hypothesis 
testing. In an application based on data about a sample of Italian workers who are 
employed in different firms, we observed that this composite likelihood approach gives 
sensible estimates. In this application the response variable is binary, but the approach is 
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completely general in terms of type of response variable, which may be also continuous, 
discrete, or ordinal. 

Possible further developments of the proposed approach may concern the implemen- 
tation of faster algorithms for the maximization of the pairwise likelihood that we use. In 
fact, we maximize this function by an Expectation-Maximization (EM) algorithm which 
is implemented along the same lines as in Bartolucci and Farcomeni (2009). However, we 
think that this maximization may be made much faster by using, after a certain number 
of EM iterations, a Newton- Raphson algorithm. The implementation of this algorithm is 
made possible by the availability of the score and the observed information matrix (for 
the pairwise likelihood function), that we already are able to compute within the present 
approach. 

Finally, another point that deserves attention is the use of alternative forms of com- 
posite likelihood for parameter estimation. In particular, in the current form, the adopted 
pairwise likelihood gives more weight to the data referred to the units belonging clusters 
having a higher dimension. Then, as suggested by Renard et al. (2004), a weighted ver- 
sion of the pairwise log-likelihood may be more suitable when the clusters are strongly 
different in terms of dimension. Note, however, that in our application the clusters are 
not very different in terms of dimension and so, at least in the present case, we do not 
expect to obtain very different results on the basis of a weighted composite likelihood 
function. 
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